Internet Archive Web Crawls

The Internet Archive discovers and captures web pages through many different web crawls.



Government Web & Data Archive
Collection: 6,269 items; 9.2M views
This collaborative project is an extension of the 2016 End of Term project, intended to document the federal government's web presence by archiving government websites and data. As part of this preservation effort, URLs supplied by partner institutions, as well as those nominated by the public, will be crawled regularly to provide an ongoing view of federal agencies' web and social media presence. Key partners on this effort are the Environmental Data & Governance...
Topics: government, data, federal, congress
Shallow Crawl Started November 2012
Collection: 137 items; 8.3M views
Shallow crawl, started in November 2012, that collects content one level deep, including embeds. This data is currently not publicly accessible.
Live Web Proxy Crawls
Web: 8M views
End of Term 2008 UNT Dot Gov Crawl
Collection: 282 items; 8M views
End of Term 2008 crawl of .gov domains gathered by the University of North Texas. This data is currently not publicly accessible. UNT is a student-focused, public research university located in Denton, Texas.
Live Web Proxy Crawls
Web: 8M views
Live Web Proxy Crawls
Web: 8M views
Live Web Proxy Crawls
Web: 8M views
End Of Term 2016 UNT Crawls
Collection: 1,275 items; 7.9M views
End of Term 2016 Web Archive government web crawls by project partner the University of North Texas.
Topics: end of term, federal government, 2016, president, congress, university of north texas
Live Web Proxy Crawls
Web: 7.8M views
Hurricane Katrina
Collection: 112 items; 7.8M views
Data related to Hurricane Katrina collected in 2005 by Internet Archive. This data is currently not publicly accessible. From Wikipedia: Hurricane Katrina was the deadliest and most destructive Atlantic hurricane of the 2005 Atlantic hurricane season. It was the costliest natural disaster, as well as one of the five deadliest hurricanes, in the history of the United States. Among recorded Atlantic hurricanes, it was the sixth strongest overall. At least 1,833 people died in the hurricane and...
Live Web Proxy Crawls
Web: 7.6M views
Data crawled by National Endowment for the Humanities and JISC on behalf of Internet Archive from Fri Aug 08 00:17:40 PDT 2008 to Thu Jun 26 05:29:33 PDT 2008
Topic: crawldata
Live Web Proxy Crawls
Web: 7M views
Live Web Proxy Crawls
Web: 6.7M views
Internet Archive crawldata from Webwide Crawl, captured by crawl421.us.archive.org:wide from Mon Feb 12 21:42:38 PST 2018 to Mon Feb 12 15:20:34 PST 2018.
Topic: crawldata
Live Web Proxy Crawls
Web: 6.5M views
Live Web Proxy Crawls
Web: 6.1M views
Wide Crawl started February 2014
Web: 6M views
Internet Archive crawldata from Webwide Crawl, captured by crawl453.us.archive.org:wide from Wed Feb 19 01:09:37 PST 2014 to Tue Feb 18 21:33:27 PST 2014.
Topic: crawldata
Wide Crawl started February 2014
Web: 6M views
Internet Archive crawldata from Webwide Crawl, captured by crawl420.us.archive.org:wide from Tue Feb 18 17:01:58 PST 2014 to Tue Feb 18 13:14:06 PST 2014.
Topic: crawldata
Wide Crawl started February 2014
Web: 6M views
Internet Archive crawldata from Webwide Crawl, captured by crawl454.us.archive.org:wide from Wed Feb 19 05:20:19 PST 2014 to Wed Feb 19 01:54:33 PST 2014.
Topic: crawldata
Wide Crawl started February 2014
Web: 5.9M views
Internet Archive crawldata from Webwide Crawl, captured by crawl427.us.archive.org:wide from Wed Feb 19 09:49:01 PST 2014 to Wed Feb 19 06:07:15 PST 2014.
Topic: crawldata
Live Web Proxy Crawls
Web: 5.9M views
Wide Crawl started January 2012
Web: 5.9M views
Internet Archive crawldata from Webwide Crawl, captured by crawl423.us.archive.org:wide from Tue Jan 17 08:02:53 PST 2012 to Tue Jan 17 01:16:20 PST 2012.
Topic: crawldata
Wide Crawl started February 2014
Web: 5.8M views
Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Wed Feb 19 07:58:38 PST 2014 to Wed Feb 19 05:13:46 PST 2014.
Topic: crawldata
Live Web Proxy Crawls
Web: 5.8M views
Live Web Proxy Crawls
Web: 5.6M views
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Mon Mar 28 12:43:47 PDT 2011 to Mon Mar 28 16:56:17 PDT 2011.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Fri Nov 01 06:23:33 PDT 2002 to Tue Nov 19 23:24:07 PDT 2002
(1 review)
Topic: crawldata
Live Web Proxy Crawls
Web: 5.4M views
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Sun May 8 07:07:52 PDT 2011 to Sun May 8 08:00:29 PDT 2011.
Topic: crawldata
Live Web Proxy Crawls
Web: 5.3M views; 1 favorite
Live Web Proxy Crawls
Web: 5M views
Live Web Proxy Crawls
Web: 5M views
2004 Indian Ocean earthquake and tsunami
Collection: 42 items; 5M views
Data related to the 2004 Indian Ocean earthquake and tsunami collected by Internet Archive. This data is currently not publicly accessible.
Top 150 Crawl
Collection: 29 items; 4.9M views
Top 150 Alexa sites crawl performed by Internet Archive. This data is currently not publicly accessible.
UNT Web
Collection: 35 items; 4.8M views
This collection contains all collaborative crawl data contributed by University of North Texas (UNT).
Topics: UNT, web, texas, eot
Survey Crawl Number 8
Web: 4.7M views
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl818.us.archive.org:survey from Fri Dec 14 21:00:08 PST 2018 to Fri Dec 14 22:34:06 PST 2018.
Topic: crawldata
Internet Archive crawldata from Survey Webwide Crawl, captured by crawl344.us.archive.org:survey from Thu Oct 12 08:48:34 PDT 2017 to Thu Oct 12 01:56:31 PDT 2017.
Topic: crawldata
Live Web Proxy Crawls
Web: 4.6M views
UK Government Site Crawl
Collection: 107 items; 4.6M views
Collaborative closure crawl of British government sites performed by Internet Archive. This data is currently not publicly accessible. From Wikipedia: GOV.UK is a United Kingdom public sector information website, created by the Government Digital Service to provide a single point of access to HM Government services.
Live Web Proxy Crawls
Web: 4.6M views
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Tue Mar 29 00:12:24 PDT 2011 to Tue Mar 29 07:24:41 PDT 2011.
Topic: crawldata
Wide Crawl started February 2014
Web: 4.3M views
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Wed Feb 19 08:18:23 PST 2014 to Wed Feb 19 04:21:37 PST 2014.
Topic: crawldata
Wayback Robots Crawl
Collection: 129 items; 4.3M views
Wayback robots.txt crawl performed by Internet Archive. This data is currently not publicly accessible.
Wide Crawl started February 2014
Web: 4.2M views
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Tue Feb 18 22:58:46 PST 2014 to Tue Feb 18 19:25:19 PST 2014.
Topic: crawldata
Live Web Proxy Crawls
Web: 4.2M views
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Fri May 20 00:54:34 PDT 2011 to Fri May 20 04:55:47 PDT 2011.
Topic: crawldata
Live Web Proxy Crawls
Web: 4.2M views
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Thu May 19 17:19:06 PDT 2011 to Thu May 19 17:46:28 PDT 2011.
Topic: crawldata
End Of Term 2016 Library of Congress Crawls
Collection: 3,892 items; 4.1M views
End of Term 2016 Web Archive government web crawls by project partner the Library of Congress.
Topics: end of term, federal government, 2016, president, congress, library of congress, web, data, library...
Live Web Proxy Crawls
Web: 4M views
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Wed Mar 30 03:11:49 PDT 2011 to Wed Mar 30 10:24:04 PDT 2011.
Topic: crawldata
Live Web Proxy Crawls
Web: 3.8M views
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2013-01-09T09:11:28 UTC to 2013-01-09T20:54:17 UTC.
Topic: crawldata
End of Term 2016 Post-Inauguration Crawls
Collection: 5,308 items; 3.8M views
This collection contains web crawls performed as the post-inauguration crawl for part of the End of Term Web Archive, a collaborative project that aims to preserve the U.S. federal government web presence at each change of administration. Content includes publicly accessible government websites hosted on .gov, .mil, and relevant non-.gov domains, as well as government social media materials. The web archiving was performed in the Winter of 2016 and Spring of 2017 to capture websites...
Topics: end of term, federal government, 2016, president, congress
Wide Crawl started June 2014
Web: 3.7M views
Internet Archive crawldata from Webwide Crawl, captured by crawl429.us.archive.org:wide from Thu Jul 10 07:24:15 PDT 2014 to Thu Jul 10 01:45:52 PDT 2014.
Topic: crawldata
Live Web Proxy Crawls
Web: 3.7M views
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2013-01-09T16:00:53 UTC to 2013-01-10T01:01:51 UTC.
Topic: crawldata
Wide Crawl started June 2014
Web: 3.7M views
Internet Archive crawldata from Webwide Crawl, captured by crawl421.us.archive.org:wide from Thu Jul 10 06:43:41 PDT 2014 to Thu Jul 10 01:23:01 PDT 2014.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Fri Sep 26 12:13:03 PDT 2003 to Wed Dec 10 23:25:51 PDT 2003
Topic: crawldata
Live Web Proxy Crawls
Web: 3.6M views
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Tue Mar 29 14:27:39 PDT 2011 to Tue Mar 29 20:01:59 PDT 2011.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Sat Sep 18 12:46:38 PDT 2004 to Thu May 05 09:34:36 PDT 2005
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Sun Nov 16 11:09:15 PDT 2003 to Tue Jun 17 19:02:43 PDT 2003
Topic: crawldata
Live Web Proxy Crawls
Web: 3.4M views
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live1.us.archive.org from 2013-03-29T09:54:55 UTC to 2013-03-29T13:00:26 UTC.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Sun Aug 14 21:06:52 PDT 2005 to Sun Nov 20 23:12:32 PDT 2005
Topic: crawldata
Live Web Proxy Crawls
Web: 3.4M views
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live3.us.archive.org from 2013-10-14T07:53:13 UTC to 2013-10-18T09:13:35 UTC.
Topic: crawldata
Live Web Proxy Crawls
Web: 3.3M views
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Thu Mar 31 04:44:35 PDT 2011 to Thu Mar 31 13:13:56 PDT 2011.
Topic: crawldata
Live Web Proxy Crawls
Web: 3.2M views
Live Web Proxy Crawls
Web: 3.2M views
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live0.us.archive.org from 2013-03-29T09:35:58 UTC to 2013-03-29T15:03:34 UTC.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Thu Nov 27 00:45:40 PDT 2003 to Sat Mar 29 12:46:47 PDT 2003
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Wed May 04 22:04:03 PDT 2005 to Tue Oct 25 16:31:50 PDT 2005
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Mon Aug 02 08:33:20 PDT 2004 to Thu Jun 02 08:07:17 PDT 2005
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Mon Jun 16 20:25:23 PDT 2003 to Wed Nov 26 17:48:35 PDT 2003
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Sat Mar 29 18:35:25 PDT 2003 to Sun Nov 16 08:00:57 PDT 2003
Topic: crawldata
Live Web Proxy Crawls
Web: 3.2M views
Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Fri May 20 12:11:05 PDT 2011 to Fri May 20 16:12:31 PDT 2011.
Topic: crawldata
Live Web Proxy Crawls
Web: 3.2M views
Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-live3.us.archive.org from 2013-07-11T00:40:27 UTC to 2013-07-11T05:26:31 UTC.
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Tue Mar 08 20:18:06 PDT 2005 to Tue Sep 27 05:06:47 PDT 2005
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Sat Mar 05 13:09:57 PDT 2005 to Sat Oct 29 13:54:47 PDT 2005
Topic: crawldata
Internet Archive crawldata from Webwide Crawl, captured by crawl805.us.archive.org:wide from Mon May 13 17:55:38 PDT 2019 to Mon May 13 13:45:55 PDT 2019.
(1 review)
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Thu Jun 19 05:30:16 PDT 2003 to Wed Nov 26 23:13:24 PDT 2003
Topic: crawldata
Data crawled by Internet Archive on behalf of Internet Archive from Wed May 03 07:47:56 PDT 2006 to Fri Sep 17 05:50:57 PDT 2004
Topic: crawldata
Live Web Proxy Crawls
Web: 3M views
Data crawled by Internet Archive on behalf of Internet Archive from Tue Jun 17 21:04:33 PDT 2003 to Sat Nov 29 14:39:46 PDT 2003
Topic: crawldata